Smart Paper Notes

Efficient Long Sequence Modeling via State Space Augmented Transformer

Simiao Zuo , Xiaodong Liu , Jian Jiao , Denis Charles , Eren Manavoglu , Tuo Zhao , Jianfeng Gao

Categories: Natural Language Processing | Machine Learning

2022-12-15

Transformer models have achieved superior performance in various natural language processing tasks. However, the quadratic computational cost of the attention mechanism limits its practicality for long sequences. Existing attention variants improve computational efficiency, but they have limited ability to compute global information effectively. In parallel to Transformer models, state space models (SSMs) are tailored for long sequences, but they are not flexible enough to capture complicated local information. We propose SPADE, short for $\underline{\textbf{S}}$tate s$\underline{\textbf{P}}$ace $\underline{\textbf{A}}$ugmente$\underline{\textbf{D}}$ Transform$\underline{\textbf{E}}$r. Specifically, we augment the bottom layer of SPADE with an SSM, and we employ efficient local attention methods for the other layers. The SSM supplies global information, which compensates for the lack of long-range dependency in local attention methods. Experimental results on the Long Range Arena benchmark and language modeling tasks demonstrate the effectiveness of the proposed method. To further demonstrate the scalability of SPADE, we pre-train large encoder-decoder models and present fine-tuning results on natural language understanding and natural language generation tasks.
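
The local attention idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the SPADE implementation: the function names `local_attention` and `score_count`, the symmetric window, and the use of plain Python lists are all assumptions made for illustration, and the SSM layer is not modeled at all. The sketch only shows why restricting each position to a fixed window lowers the number of query-key scores from quadratic to roughly linear in the sequence length.

```python
import math

def local_attention(queries, keys, values, window):
    """Hypothetical sliding-window attention: position i attends only to
    positions j with |i - j| <= window (an assumed, symmetric window)."""
    n = len(queries)
    dim = len(queries[0])
    outputs = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Scaled dot-product scores restricted to the local window.
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(dim)
            for j in range(lo, hi)
        ]
        # Numerically stable softmax over the window.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Weighted average of the values inside the window.
        outputs.append([
            sum(w * values[j][d] for w, j in zip(weights, range(lo, hi))) / total
            for d in range(len(values[0]))
        ])
    return outputs

def score_count(n, window):
    """Number of query-key scores that local attention computes."""
    return sum(min(n, i + window + 1) - max(0, i - window) for i in range(n))

if __name__ == "__main__":
    # Full attention on 1024 tokens needs 1024 * 1024 scores;
    # a window of 8 needs far fewer.
    print(score_count(1024, 1023), score_count(1024, 8))
```

Because no position sees beyond its window, information from distant tokens never reaches it in a single layer, which is the gap the abstract says the SSM layer is meant to fill.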

translated by Google Translate

High Dimensional Binary Classification under Label Shift: Phase Transition and Regularization

Jiahui Cheng , Minshuo Chen , Hao Liu , Tuo Zhao , Wenjing Liao

Categories: Machine Learning

2022-12-01

Label shift has been widely believed to be harmful to the generalization performance of machine learning models. Researchers have proposed many approaches to mitigate the impact of label shift, e.g., balancing the training data. However, these methods often consider the underparametrized regime, where the sample size is much larger than the data dimension. Research under the overparametrized regime is very limited. To bridge this gap, we propose a new asymptotic analysis of the Fisher Linear Discriminant classifier for binary classification with label shift. Specifically, we prove that there exists a phase transition phenomenon: under a certain overparametrized regime, the classifier trained using imbalanced data outperforms its counterpart trained on reduced balanced data. Moreover, we investigate the impact of regularization on label shift: the aforementioned phase transition vanishes as the regularization becomes strong.
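
For readers unfamiliar with the classifier analyzed here, the following is a minimal sketch of a Fisher Linear Discriminant with a ridge-style regularizer, trained on imbalanced classes. It is an assumption-laden illustration, not the estimator studied in the paper: the helper names (`mean`, `pooled_covariance`, `solve`, `fit_fld`, `predict`), the pooled covariance normalization, the midpoint threshold, and the `lam * I` regularizer are all choices made here for concreteness.

```python
def mean(rows):
    n = len(rows)
    return [sum(r[d] for r in rows) / n for d in range(len(rows[0]))]

def pooled_covariance(class0, class1):
    """Within-class covariance pooled over both classes (assumed 1/n scaling)."""
    mu0, mu1 = mean(class0), mean(class1)
    dim = len(mu0)
    cov = [[0.0] * dim for _ in range(dim)]
    for rows, mu in ((class0, mu0), (class1, mu1)):
        for r in rows:
            c = [r[d] - mu[d] for d in range(dim)]
            for a in range(dim):
                for b in range(dim):
                    cov[a][b] += c[a] * c[b]
    n = len(class0) + len(class1)
    return [[v / n for v in row] for row in cov]

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def fit_fld(class0, class1, lam):
    """Direction w = (cov + lam * I)^{-1} (mu1 - mu0); lam is an assumed ridge term."""
    mu0, mu1 = mean(class0), mean(class1)
    cov = pooled_covariance(class0, class1)
    dim = len(mu0)
    reg = [[cov[a][b] + (lam if a == b else 0.0) for b in range(dim)] for a in range(dim)]
    w = solve(reg, [mu1[d] - mu0[d] for d in range(dim)])
    midpoint = [(mu0[d] + mu1[d]) / 2 for d in range(dim)]
    bias = -sum(w[d] * midpoint[d] for d in range(dim))
    return w, bias

def predict(w, bias, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + bias > 0 else 0

if __name__ == "__main__":
    # Imbalanced toy data: three samples of class 0, one of class 1.
    w, bias = fit_fld([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [[4.0, 4.0]], lam=0.1)
    print(predict(w, bias, [4.0, 4.0]), predict(w, bias, [0.0, 0.0]))
```

A positive `lam` keeps the linear system solvable when the sample covariance is singular, which happens whenever the dimension exceeds the number of samples, the overparametrized setting the abstract is concerned with.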


Differentially Private Estimation of Hawkes Process

Simiao Zuo , Tianyi Liu , Tuo Zhao , Hongyuan Zha

Categories: Machine Learning | Machine Learning (Statistics)

2022-09-15

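Point process models are of great importance in real-world applications. In certain critical applications, estimating a point process model involves large amounts of sensitive personal data from users. Privacy concerns naturally arise, yet they have not been addressed in the existing literature. To bridge this glaring gap, we propose the first general differentially private estimation procedure for point process models. Specifically, we take the Hawkes process as an example and introduce a rigorous definition of differential privacy for event stream data, based on a discretized representation of the Hawkes process. We then propose two differentially private optimization algorithms that can efficiently estimate Hawkes process models with the desired privacy and utility guarantees under two different settings. Experiments are provided to support our theoretical analysis.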


Adaptively-weighted Integral Space for Fast Multiview Clustering

Man-Sheng Chen , Tuo Liu , Chang-Dong Wang , Dong Huang , Jian-Huang Lai

Categories: Machine Learning | Artificial Intelligence

2022-08-25

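Multiview clustering has been extensively studied as a way to exploit multi-source information and improve clustering performance. Most existing works compute an n × n affinity graph from some similarity/distance metric (e.g., the Euclidean distance) or from learned representations, and explore pairwise correlations across views. Unfortunately, this usually requires quadratic or even cubic complexity, which makes clustering large-scale datasets difficult. Recently, some efforts have been made to capture the data distribution across multiple views by selecting view-wise anchor representations with k-means, or by direct matrix factorization of the original observations. Despite their considerable success, few of these methods consider the view-insufficiency issue; they implicitly assume that each individual view is sufficient to recover the cluster structure. Moreover, they cannot simultaneously discover the latent integral space and the cluster structure shared across views. In view of this, we propose an Adaptively-weighted Integral Space for Fast Multiview Clustering (AIMC) with nearly linear complexity. Specifically, view generation models are designed to reconstruct the view observations from the latent integral space with diverse adaptive contributions. Meanwhile, a centroid representation with an orthogonality constraint and a cluster partition are seamlessly constructed to approximate the latent integral space. An alternating minimization algorithm is developed to solve the optimization problem, and it is proved to have linear time complexity w.r.t. the sample size. Extensive experiments on several real-world datasets confirm the superiority of the proposed AIMC method over state-of-the-art methods.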


Rectify ViT Shortcut Learning by Visual Saliency

Chong Ma , Lin Zhao , Yuzhong Chen , David Weizhong Liu , Xi Jiang , Tuo Zhang , Xintao Hu , Dinggang Shen , Dajiang Zhu , Tianming Liu

Categories: Computer Vision

2022-06-17

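Shortcut learning is common in deep learning models, but it leads to degenerate feature representations and consequently jeopardizes the model's generalizability and interpretability. However, shortcut learning in the widely used Vision Transformer (ViT) framework is largely unexplored. Meanwhile, introducing domain-specific knowledge is a major approach to rectifying shortcuts, which are dominated by background-related factors. For example, in the medical imaging field, radiologists' eye-gaze data is an effective human visual prior with great potential to guide deep learning models to focus on meaningful foreground regions. However, obtaining eye-gaze data is time-consuming, labor-intensive, and sometimes even impractical. In this work, we propose a novel and effective saliency-guided vision transformer (SGT) model to rectify shortcut learning in ViT in the absence of eye-gaze data. Specifically, a computational visual saliency model is adopted to predict saliency maps for the input image samples. The saliency maps are then used to distill the most informative image patches. In the proposed SGT, self-attention among image patches focuses only on the distilled informative patches. Considering that this distillation operation may lead to a loss of global information, we further introduce, in the last encoder layer, a residual connection that captures self-attention across all image patches. Experimental results on four independent public datasets show that our SGT framework can effectively learn and leverage human prior knowledge without eye-gaze data and outperforms the baselines. Meanwhile, it successfully rectifies harmful shortcut learning and significantly improves the interpretability of the ViT model, demonstrating the promise of transferring human prior knowledge to rectify shortcut learning.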


Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint

Hao Liu , Minshuo Chen , Siawpeng Er , Wenjing Liao , Tong Zhang , Tuo Zhao

Categories: Machine Learning (Statistics) | Machine Learning

2022-06-09

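Overparameterized neural networks enjoy great representation power on complex data and, more importantly, yield sufficiently smooth output, which is crucial to their generalization and robustness. Most existing function approximation theories suggest that, with sufficiently many parameters, neural networks can approximate certain classes of functions well in terms of function value. The neural networks themselves, however, can be highly nonsmooth. To bridge this gap, we take convolutional residual networks (ConvResNets) as an example and prove that large ConvResNets can not only approximate a target function in terms of function value, but also exhibit sufficient first-order smoothness. Moreover, we extend our theory to approximating functions supported on a low-dimensional manifold. Our theory partially justifies the benefits of using deep networks in practice. Numerical experiments on adversarially robust image classification are provided to support our theory.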


Deep Learning Assisted End-to-End Synthesis of mm-Wave Passive Networks with 3D EM Structures: A Study on A Transformer-Based Matching Network

Siawpeng Er , Edward Liu , Minshuo Chen , Yan Li , Yuqi Liu , Tuo Zhao , Hua Wang

Categories: Machine Learning

2022-01-06

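This paper presents a deep learning assisted synthesis approach for direct end-to-end generation of RF/mm-wave passive matching networks with 3D EM structures. Unlike existing approaches that synthesize EM structures from target circuit component values and target topologies, our proposed approach directly synthesizes the passive network, given the network topology, with the desired performance values as input. We showcase the proposed synthesis neural network (NN) model on an on-chip 1:1 transformer-based impedance matching network. By leveraging parameter sharing, the synthesis NN model successfully extracts relevant features from the input impedance and load capacitors, and predicts the transformer 3D EM geometry in a 45nm SOI process that matches a standard 50 $\Omega$ load to the target input impedance while absorbing the two loading capacitors. As a proof of concept, several example transformer geometries were synthesized and verified in Ansys HFSS to provide the desired input impedance.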


Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces

Hao Liu , Haizhao Yang , Minshuo Chen , Tuo Zhao , Wenjing Liao

Categories: Machine Learning (Statistics) | Machine Learning

2022-01-01

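Learning operators between infinite-dimensional spaces is an important learning task arising in a wide range of applications in machine learning, imaging science, mathematical modeling, and simulation. This paper studies the nonparametric estimation of Lipschitz operators using deep neural networks. Non-asymptotic upper bounds are derived for the generalization error of the empirical risk minimizer over a properly chosen network class. Under the assumption that the target operator exhibits a low-dimensional structure, our error bounds decay as the training sample size increases, at an attractively fast rate that depends on the intrinsic dimension in our estimation. Our assumptions cover most scenarios in real applications, and our results give rise to fast rates by exploiting low-dimensional structures in operator estimation. We also investigate the influence of network structure (e.g., network width, depth, and sparsity) on the generalization error of the neural network estimator, and we propose a general, quantitative suggestion for choosing the network structure to maximize learning efficiency.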


NeuralRoom: Geometry-Constrained Neural Implicit Surfaces for Indoor Scene Reconstruction

Yusen Wang , Zongcheng Li , Yu Jiang , Kaixuan Zhou , Tuo Cao , Yanping Fu , Chunxia Xiao

Categories: Computer Vision

2022-10-13

We present a novel neural surface reconstruction method called NeuralRoom for reconstructing room-sized indoor scenes directly from a set of 2D images. Recently, implicit neural representations have become a promising way to reconstruct surfaces from multiview images due to their high-quality results and simplicity. However, implicit neural representations usually cannot reconstruct indoor scenes well because they suffer from severe shape-radiance ambiguity. We assume that the indoor scene consists of texture-rich regions and flat texture-less regions. In texture-rich regions, multiview stereo can obtain accurate results. In flat regions, normal estimation networks usually obtain a good normal estimation. Based on these observations, we use reliable geometric priors to reduce the possible spatial variation range of implicit neural surfaces and thereby alleviate shape-radiance ambiguity. Specifically, we use multiview stereo results to limit the NeuralRoom optimization space and then use reliable geometric priors to guide NeuralRoom training. NeuralRoom then produces a neural scene representation that can render images consistent with the input training images. In addition, we propose a smoothing method called perturbation-residual restrictions to improve the accuracy and completeness of flat regions; it assumes that the sampling points on a local surface should have the same normal and a similar distance to the observation center. Experiments on the ScanNet dataset show that our method can reconstruct the texture-less areas of indoor scenes while maintaining the accuracy of detail. We also apply NeuralRoom to more advanced multiview reconstruction algorithms and significantly improve their reconstruction quality.


First-order Policy Optimization for Robust Markov Decision Process

Yan Li , Tuo Zhao , Guanghui Lan

Categories: Machine Learning | Artificial Intelligence

2022-09-21

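We consider the problem of solving the robust Markov decision process (MDP), which involves a set of discounted, finite-state, finite-action-space MDPs with uncertain transition kernels. The goal of planning is to find a robust policy that optimizes the worst-case values against the transition uncertainties, and thus encompasses standard MDP planning as a special case. For $(\mathbf{s},\mathbf{a})$-rectangular uncertainty sets, we develop a policy-based first-order method, namely robust policy mirror descent (RPMD), and establish $\mathcal{O}(\log(1/\epsilon))$ and $\mathcal{O}(1/\epsilon)$ iteration complexities for finding an $\epsilon$-optimal policy, with two increasing-stepsize schemes. The preceding convergence results for RPMD apply to any Bregman divergence, provided the policy space has a bounded radius, measured by the divergence, when centered at the initial policy. Moreover, when the Bregman divergence corresponds to the squared Euclidean distance, we establish an $\mathcal{O}(\max\{1/\epsilon, 1/(\eta\epsilon^2)\})$ complexity of RPMD with any constant stepsize $\eta$. For a general class of Bregman divergences, a similar complexity is also established for RPMD, provided the uncertainty set satisfies relative strong convexity. We further develop a stochastic variant, named SRPMD, for the case where first-order information is available only through online interactions with the nominal environment. For general Bregman divergences, we establish $\mathcal{O}(1/\epsilon^2)$ and $\mathcal{O}(1/\epsilon^3)$ sample complexities with two increasing-stepsize schemes. For the Euclidean Bregman divergence, we establish an $\mathcal{O}(1/\epsilon^3)$ sample complexity with constant stepsizes. To the best of our knowledge, all the aforementioned results appear to be new for policy-based first-order methods applied to the robust MDP problem.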
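
As background for the mirror descent terminology, here is a minimal sketch of a single policy mirror descent update at one state when the Bregman divergence is the KL divergence. It is a generic textbook-style update, not the RPMD algorithm itself: the function name `mirror_descent_step` is hypothetical, the Q-values are treated as costs (lower is better) by assumption, and the worst-case (robust) evaluation that would produce those Q-values is assumed to have been done elsewhere.

```python
import math

def mirror_descent_step(policy, q_values, stepsize):
    """One KL mirror descent update for a single state.

    policy:   current action probabilities (all strictly positive)
    q_values: per-action costs, assumed already evaluated (lower is better)
    stepsize: the stepsize eta
    Returns the policy proportional to policy[a] * exp(-stepsize * q_values[a]).
    """
    logits = [math.log(p) - stepsize * q for p, q in zip(policy, q_values)]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    policy = [0.5, 0.5]
    for _ in range(3):
        # Action 1 has the lower cost, so its probability grows each step.
        policy = mirror_descent_step(policy, [1.0, 0.0], stepsize=1.0)
    print(policy)
```

With the KL divergence the update has this closed multiplicative form; with the squared Euclidean distance mentioned in the abstract it would instead be a projected gradient step onto the probability simplex.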
